Copilot AI commented Dec 27, 2025

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.

Description

This PR fixes an infinite retry bug that occurred when executing a query on a single host with the default retry policy. When the query failed with a server error, it would retry infinitely without delays.

Root Cause

In the _make_query_plan() method in cassandra/cluster.py, when a specific host was provided (via the host parameter), the query_plan was set to a list [self._host] instead of an iterator. This caused the send_request() method to iterate over the list from the beginning on each retry attempt, leading to infinite retries.
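The list-versus-iterator difference can be seen in isolation with a minimal sketch. It does not use the driver: `attempt` is a hypothetical stand-in for one pass of the `for host in self.query_plan` loop that the description attributes to send_request(), and the host address is borrowed from the issue's log output.

```python
def attempt(query_plan):
    """Hypothetical stand-in for one send_request() pass: take the next host, if any."""
    for host in query_plan:
        return host  # first host still available in the plan
    return None  # plan exhausted


# A list is re-iterated from its first element on every pass.
list_plan = ["127.99.46.1"]
list_attempts = [attempt(list_plan) for _ in range(3)]

# An iterator remembers its position, so later passes find nothing left.
iter_plan = iter(["127.99.46.1"])
iter_attempts = [attempt(iter_plan) for _ in range(3)]

print(list_attempts)  # ['127.99.46.1', '127.99.46.1', '127.99.46.1']
print(iter_attempts)  # ['127.99.46.1', None, None]
```

With the list, every pass finds the same host again, which matches the endless retries reported in the issue.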

Changes Made

  1. cassandra/cluster.py (line 4550): Changed self.query_plan = [self._host] to self.query_plan = iter([self._host]) to ensure the query plan is an iterator that gets exhausted after one attempt, preventing infinite retries.

  2. tests/unit/test_response_future.py: Added a new test test_single_host_query_plan_exhausted_after_one_retry that reproduces the bug and verifies the fix prevents infinite retries when a single host fails with a server error.
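To illustrate the kind of behavior such a test checks, here is a simplified model. It is not the driver's code and not the test added in this PR: `FakeFuture`, `run_until_done`, and the `max_attempts` safety cap are hypothetical names introduced for the sketch, and the retry policy is reduced to "always retry after an error".

```python
class FakeFuture:
    """Toy model of a request future whose only host always returns a server error."""

    def __init__(self, query_plan):
        self.query_plan = query_plan
        self.attempts = 0
        self.final_error = None

    def send_request(self):
        # Resumes where the plan left off only if query_plan is an iterator.
        for host in self.query_plan:
            self.attempts += 1
            return True  # a request was sent to `host`
        self.final_error = "NoHostAvailable"
        return False


def run_until_done(future, max_attempts=1000):
    """Retry after every simulated server error, with a cap so the sketch terminates."""
    while future.attempts < max_attempts and future.send_request():
        pass  # each sent request "fails", so the simulated policy asks for a retry
    return future


buggy = run_until_done(FakeFuture(["host-1"]))
fixed = run_until_done(FakeFuture(iter(["host-1"])))

print(buggy.attempts, buggy.final_error)  # 1000 None
print(fixed.attempts, fixed.final_error)  # 1 NoHostAvailable
```

In this model the list-based plan only stops because of the artificial cap, while the iterator-based plan is exhausted after one attempt and ends with a final error.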

Testing

  • ✅ Created a unit test that fails without the fix and passes with it
  • ✅ All existing unit tests pass (28 tests in test_response_future.py, 11 tests in test_cluster.py)
  • ✅ Security checks (CodeQL) passed with no issues

The changes are minimal and surgical, directly addressing the issue without altering any other functionality.

Original prompt

This section details the original issue you should resolve

<issue_title>infinite retry when a single host fails with server error</issue_title>
<issue_description>When executing a query on a single host with the default retry policy, if the query fails with a server error, the query is retried infinitely and without delays.

Consider this simple test:

    servers = await manager.servers_add(1, auto_rack_dc="dc1")
    cql, hosts = await manager.get_ready_cql(servers)
    async with new_test_keyspace(manager, "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1}") as ks:
        await cql.run_async(f"CREATE TABLE {ks}.t2 (pk int, ck int, v int, PRIMARY KEY (pk, ck))")
        await manager.api.enable_injection(servers[0].ip_addr, "fail_mutate_internal", one_shot=False)
        await cql.run_async(f"INSERT INTO {ks}.t2(pk, ck, v) VALUES(0,0,0)")

The query fails immediately with a NoHostAvailable exception:

>           await cql.run_async(f"INSERT INTO {ks}.t2(pk, ck, v) VALUES(0,0,0)")
E           cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.193.50.1:9042 dc1>: <Error from server: code=0000 [Server error] message="Error injection in mutate_internal">})

However, suppose we change the query to run on a specific host with host=hosts[0]:

        await cql.run_async(f"INSERT INTO {ks}.t2(pk, ck, v) VALUES(0,0,0)", host=hosts[0])

Now we see an endless stream of Server error logs:

11:16:34.820 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.820 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.820 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.821 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.821 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.822 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.822 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.822 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.823 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.823 WARNING>  Host 127.99.46.1:9042 error: Server error.

I suspect the problem is that send_request expects query_plan to be an iterator, but make_query_plan creates a list when there is a single host.</issue_description>

Comments on the Issue (you are @copilot in this section)



mykaul commented Jan 7, 2026

@copilot - can you go back to this?

Copilot AI commented Jan 7, 2026

@copilot - can you go back to this?

I've completed the remaining tasks for this PR. All code review and security checks have passed. The fix is minimal and focused:

  • Changed cassandra/cluster.py line 4550 to wrap the single-host list in iter() to prevent infinite retries
  • Added a unit test that reproduces the bug and validates the fix
  • All existing tests pass (28 tests in test_response_future.py, 11 tests in test_cluster.py)
  • Security analysis (CodeQL) passed with no issues

The PR is now ready for review.

Copilot AI changed the title from "[WIP] Fix infinite retry issue when a host fails with server error" to "Fix infinite retry when a single host fails with server error" on Jan 7, 2026
mykaul marked this pull request as ready for review on January 7, 2026 16:42

Development

Successfully merging this pull request may close these issues.

infinite retry when a single host fails with server error
